Programming environment

Download raw data

Training data for this study

Table S1. Raw data list
name size modified_date id
Preprocess.R 2.6000e+03 06/02/2024 10:43 PM syn60236613
Sample_annotation.csv 8.9500e+04 05/30/2024 8:37 AM syn60157686
Probe_array.csv 7.3000e+07 05/30/2024 9:04 AM syn60157718
Probe_annotation.csv 5.6870e+08 05/30/2024 8:49 AM syn60157694
DetectionP_subchallenge1.csv 2.7340e+09 05/24/2024 5:35 AM syn59870646
DetectionP_subchallenge2.csv 4.9870e+09 05/24/2024 5:51 AM syn59872208
Beta_raw_subchallenge1.csv.gz 5.8690e+09 05/24/2024 5:19 AM syn59868755
Beta_raw_subchallenge2.csv.gz 1.1008e+10 05/24/2024 2:19 PM syn59898399

Download original series for understanding the data generation process

Load raw data

Table S3. Training data sets used in this study
Accession N
GSE128827 5
GSE228149 5
GSE200659 11
E_MTAB_9312 13
GSE74738 13
GSE108567 16
GSE75196 24
GSE115508 25
GSE98224 48
GSE69502 52
GSE204977 55
GSE169598 64
GSE100197 95
GSE232778 187
GSE144129 210
GSE167885 242
GSE75248 334
GSE71678 343

Methylation data preprocessing

Probe filtering

Probe normalization

Phenotype preprocessing

Phenotype preprocessing for conditions

Phenotype imputation for conditions

FGR imputation

PE imputation

PE onset imputation

HELLP imputation

Diandric triploid imputation

Miscarriage imputation

Preterm imputation

GDM imputation

SGA imputation

LGA imputation

SLGA imputation

IVF imputation

Subfertility imputation

Chorioamnionitis imputation

Apply imputation

Phenotype preprocessing for non-conditions

Predictive modeling

Phenotype correlation matrix among conditions and gestational age

Table S7. Categorical variables with pair-wise perfect separation.
V1 V2
anencephaly chorioamnionitis
anencephaly diandric_triploid
anencephaly gdm
anencephaly hellp
anencephaly ivf
anencephaly lga
anencephaly miscarriage
anencephaly spina_bifida
anencephaly subfertility
diandric_triploid chorioamnionitis
diandric_triploid hellp
diandric_triploid ivf
diandric_triploid lga
diandric_triploid sga
diandric_triploid subfertility
fgr anencephaly
fgr ivf
fgr subfertility
gdm diandric_triploid
ivf chorioamnionitis
ivf hellp
ivf subfertility
miscarriage chorioamnionitis
miscarriage hellp
miscarriage ivf
miscarriage sga
miscarriage subfertility
ms_ivf anencephaly
ms_ivf diandric_triploid
ms_ivf ivf
ms_ivf miscarriage
ms_ivf ms_subfertility
ms_ivf spina_bifida
ms_ivf subfertility
ms_subfertility chorioamnionitis
ms_subfertility hellp
ms_subfertility ivf
ms_subfertility subfertility
pe anencephaly
pe diandric_triploid
pe hellp
pe pe_onset
pe spina_bifida
pe_onset anencephaly
pe_onset diandric_triploid
pe_onset hellp
pe_onset ivf
pe_onset miscarriage
pe_onset spina_bifida
pe_onset subfertility
preterm diandric_triploid
preterm ivf
preterm miscarriage
preterm subfertility
sga lga
spina_bifida chorioamnionitis
spina_bifida diandric_triploid
spina_bifida gdm
spina_bifida hellp
spina_bifida ivf
spina_bifida lga
spina_bifida miscarriage
spina_bifida subfertility
subfertility chorioamnionitis
subfertility hellp
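
A pair-wise check of this kind can be sketched in Python as follows. This is a minimal illustration, not the code used in this study. It assumes that "perfect separation" means at least one combination of observed levels never occurs in the cross-tabulation of two variables. The function names and the toy columns (`v1`, `v2`, `v3`) are hypothetical.

```python
from itertools import combinations


def has_empty_cell(x, y):
    """Return True if some combination of observed levels never co-occurs.

    Assumption: an empty cell in the cross-tabulation is what is meant by
    pair-wise perfect separation. Missing values (None) are skipped.
    """
    pairs = {(a, b) for a, b in zip(x, y) if a is not None and b is not None}
    levels_x = {a for a, _ in pairs}
    levels_y = {b for _, b in pairs}
    return len(pairs) < len(levels_x) * len(levels_y)


def separated_pairs(columns):
    """List every pair of column names whose cross-tabulation has an empty cell."""
    return [
        (u, v)
        for u, v in combinations(columns, 2)
        if has_empty_cell(columns[u], columns[v])
    ]


# Toy data only: v1 and v3 are identical, so two of their four cells are empty.
toy = {
    "v1": [0, 0, 1, 1],
    "v2": [0, 1, 0, 1],
    "v3": [0, 0, 1, 1],
}
flagged = separated_pairs(toy)
```

Pairs flagged this way cannot both enter a regression on the same samples without estimation problems, which is one reason to tabulate them before modeling.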

[Per-condition R output not shown. List elements: fgr, pe, pe_onset, preterm, anencephaly, spina_bifida, gdm, diandric_triploid, miscarriage, lga, subfertility, hellp, chorioamnionitis, ivf, subfertility]

Model development

GA prediction

The Normal-GA model was trained using samples without 12 of the 13 available conditions. These conditions were significantly correlated with GA: (1) fetal growth restriction (FGR); (2) PE; (3) PE onset (early/late/not applicable); (4) hemolysis, elevated liver enzymes, and low platelets (HELLP) syndrome; (5) anencephaly; (6) spina bifida; (7) diandric triploid; (8) miscarriage; (9) preterm delivery; (10) gestational diabetes mellitus (GDM); (11) large-for-gestational-age (LGA) infant; (12) subfertility; and (13) chorioamnionitis. Of the 13, preterm delivery was not used as an exclusion criterion because it is related to the outcome, i.e., GA, simply by definition.
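
The sample selection described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used in this study. The column names are hypothetical, each condition is assumed to be coded 1 (present) or 0 (absent, or "not applicable" for PE onset), and a missing entry is treated as absent.

```python
# Hypothetical column names for the 13 available conditions.
ALL_CONDITIONS = [
    "fgr", "pe", "pe_onset", "hellp", "anencephaly", "spina_bifida",
    "diandric_triploid", "miscarriage", "preterm", "gdm", "lga",
    "subfertility", "chorioamnionitis",
]

# Preterm delivery is not used for filtering because it is defined by GA.
FILTER_CONDITIONS = [c for c in ALL_CONDITIONS if c != "preterm"]


def select_normal_samples(samples):
    """Keep samples in which none of the 12 filter conditions is present.

    Assumption: each sample is a dict; a missing key counts as absent (0).
    """
    return [
        s for s in samples
        if all(s.get(c, 0) == 0 for c in FILTER_CONDITIONS)
    ]


# Toy samples only: "a" is preterm but otherwise unaffected, "b" has PE.
toy_samples = [
    {"id": "a", "preterm": 1},
    {"id": "b", "pe": 1},
    {"id": "c"},
]
normal = select_normal_samples(toy_samples)
```

In this sketch a preterm sample without any other condition stays in the training set, which mirrors the rule that preterm delivery is not an exclusion criterion.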

FGR prediction

PE prediction

PE onset prediction

HELLP prediction

Anencephaly prediction

Spina bifida prediction

Diandric triploid prediction

Miscarriage prediction

Preterm prediction

GDM prediction

LGA prediction

Subfertility prediction

Chorioamnionitis prediction

GA res-full prediction

GA res-full random forest

GA res-conds prediction

The Res-Conds-GA model was similar to the Resfull-GA model, but its predictors were the products of each condition's predicted probability and the residual GA estimated by a model for the corresponding condition. Specifically, for each condition we trained a model using beta values of DMPs among samples with that condition. The rationale was that the conditions differ in when pregnancies are terminated, and each pregnant woman has a different set of probabilities for the conditions. The predicted probabilities of the conditions came from the prediction (Resfull-GA) model for the conditions.
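
The predictor construction described above, multiplying each predicted probability by the condition-specific residual GA, can be sketched in Python as follows. This is a minimal illustration, not the code used in this study. The function name, the dictionary layout, and the toy numbers are hypothetical.

```python
def res_conds_features(probabilities, residual_ga):
    """Return {condition: P(condition) * condition-specific residual GA}.

    Assumption: both inputs are dicts keyed by the same condition names,
    holding values for a single sample.
    """
    unmatched = set(probabilities) ^ set(residual_ga)
    if unmatched:
        raise ValueError(f"conditions without a matching value: {sorted(unmatched)}")
    return {c: probabilities[c] * residual_ga[c] for c in probabilities}


# Toy values only, for one sample.
toy_probs = {"pe": 0.5, "fgr": 0.25}
toy_resid = {"pe": -4.0, "fgr": -2.0}
features = res_conds_features(toy_probs, toy_resid)
```

Weighting by probability means that a condition the sample is unlikely to have contributes little to the residual GA, even when the condition-specific model estimates a large shift.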

GA res-comb prediction

The Res-Comb-GA model was considered because conditions other than the 12 conditions might also affect pregnancy termination. In the Res-Comb-GA model, we limited the degrees of freedom of residual fitting using known phenotype information (Res-Conds-GA); thus, the second model fitted only the unexplained residual GA (Res-CPG-GA), simply to boost the prediction. The Res-Comb-GA model consisted of three models for <37, ≥37 and ≤40, and >40 weeks' gestation as estimated by the Normal-GA model. The number of models and the periods were also determined according to clinical knowledge and by pursuing a normal distribution of residual GA. The estimated delivery date falls at 40 weeks' gestation. Before this date, a pregnant woman might seek termination in advance because of a medical condition, whereas a woman with a normal pregnancy might seek termination from the estimated delivery date onward. We used three approaches, i.e., predicting residual GA during: (1) <37 weeks' gestation only (Res-Comb-PR-GA); (2) both <37 and ≥37 and ≤40 weeks' gestation, i.e., term before the estimated delivery date (Res-Comb-PRTB-GA); and (3) all three periods (Res-Comb-GA).
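
The three-period scheme and its three variants can be sketched in Python as follows. This is a minimal illustration, not the code used in this study. It assumes that the final prediction is the Normal-GA estimate plus a period-specific residual, and that a period without a residual model returns the Normal-GA estimate unchanged. The period labels, function names, and toy residual models are hypothetical.

```python
PERIODS = ("preterm", "term_before_edd", "after_edd")

# Periods in which each variant predicts residual GA.
VARIANTS = {
    "Res-Comb-PR-GA": {"preterm"},
    "Res-Comb-PRTB-GA": {"preterm", "term_before_edd"},
    "Res-Comb-GA": set(PERIODS),
}


def ga_period(normal_ga_weeks):
    """Assign a period from the GA (in weeks) estimated by the Normal-GA model."""
    if normal_ga_weeks < 37:
        return "preterm"
    if normal_ga_weeks <= 40:
        return "term_before_edd"
    return "after_edd"


def res_comb_predict(normal_ga_weeks, features, residual_models, variant):
    """Add the period-specific residual when the variant covers that period."""
    period = ga_period(normal_ga_weeks)
    if period not in VARIANTS[variant]:
        return normal_ga_weeks
    return normal_ga_weeks + residual_models[period](features)


# Toy residual models only: each ignores its input and returns a constant.
toy_models = {
    "preterm": lambda features: -1.0,
    "term_before_edd": lambda features: 0.5,
    "after_edd": lambda features: 1.0,
}
example = res_comb_predict(38.0, None, toy_models, "Res-Comb-PRTB-GA")
```

Routing by the Normal-GA estimate rather than by true GA keeps the scheme usable at prediction time, when true GA is unknown.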

Model evaluation

Because some probes in the sub-challenge were imputed, model accuracy would be lower using the same pipeline (Figure 3). We did not refine the probe imputation techniques because our question was specifically whether correcting collider-restriction bias substantially improved placental clock accuracy. In this sub-challenge, we also used the Res-Comb-GA model trained using 450k probes.

Performance comparison

Table S9. Model evaluation
model metric avg lb ub current_best win sub code rank task val
Normal-GA RMSE 3.004 2.994 3.014 0.956 No
Normal-GA MAE 1.935 1.927 1.943 0.721 No
Normal-GA r 0.921 0.921 0.922 0.976 No
Normal-GA (450k) RMSE 1.859 1.851 1.866 0.956 No
Normal-GA (450k) MAE 1.142 1.137 1.148 0.721 No
Normal-GA (450k) r 0.967 0.967 0.968 0.976 No
Resfull-GA* RMSE 0.738 0.736 0.740 0.956 Yes
Resfull-GA* MAE 0.497 0.496 0.499 0.721 Yes
Resfull-GA* r 0.994 0.994 0.994 0.976 Yes
Resfull-GA* (450k) RMSE 0.946 0.941 0.950 0.956 Yes 2 isitthedarkhorse 6 1 1.3552
Resfull-GA* (450k) MAE 0.613 0.611 0.615 0.721 Yes 2 isitthedarkhorse 6 1 1.073
Resfull-GA* (450k) r 0.990 0.990 0.990 0.976 Yes 2 isitthedarkhorse 6 1 0.9505
Res-Conds-GA§ RMSE 0.896 0.893 0.898 0.956 Yes
Res-Conds-GA§ MAE 0.631 0.630 0.633 0.721 Yes
Res-Conds-GA§ r 0.991 0.991 0.991 0.976 Yes
Res-Conds-GA§ (450k) RMSE 0.819 0.817 0.822 0.956 Yes 7 slidingdoors 2 1 1.0949
Res-Conds-GA§ (450k) MAE 0.551 0.550 0.553 0.721 Yes 7 slidingdoors 2 1 0.8843
Res-Conds-GA§ (450k) r 0.992 0.992 0.992 0.976 Yes 7 slidingdoors 2 1 0.9642
Res-Comb-GA RMSE 0.701 0.699 0.703 0.956 Yes
Res-Comb-GA MAE 0.496 0.495 0.498 0.721 Yes
Res-Comb-GA r 0.994 0.994 0.994 0.976 Yes
Res-Comb-GA (450k) RMSE 0.568 0.566 0.569 0.956 Yes 8 pointofdivergence 1 1 1.0772
Res-Comb-GA (450k) MAE 0.389 0.388 0.390 0.721 Yes 8 pointofdivergence 1 1 0.8876
Res-Comb-GA (450k) r 0.996 0.996 0.996 0.976 Yes 8 pointofdivergence 1 1 0.9663
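
The three metrics in Table S9 (RMSE, MAE, and Pearson's r) can be computed as in the following Python sketch. This is a minimal illustration with toy values, not the evaluation code used in this study. GA is assumed to be in weeks, and the way the avg, lb, and ub columns were obtained is not reproduced here.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error."""
    return math.sqrt(
        sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )


def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def pearson_r(observed, predicted):
    """Pearson correlation coefficient."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


# Toy values only (weeks of gestation).
toy_observed = [38.0, 39.0, 40.0, 41.0]
toy_predicted = [38.0, 40.0, 40.0, 40.0]
toy_metrics = {
    "RMSE": rmse(toy_observed, toy_predicted),
    "MAE": mae(toy_observed, toy_predicted),
    "r": pearson_r(toy_observed, toy_predicted),
}
```

Lower is better for RMSE and MAE, and higher is better for r.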

[Per-model R output not shown. List elements: Normal-GA, Normal-GA (450k), Resfull-GA*, Resfull-GA* (450k), Res-Conds-GA§, Res-Conds-GA§ (450k), Res-Comb-GA, Res-Comb-GA (450k)]

Best model predictors

Model submission

Submission 1: GA res-full random forest (testthedarkhorse)

First generate all files required for task 1 submission 2. Then copy everything in pcd1_sub2/data and pcd1_sub2/inst/extdata (task 1) to pcd2_sub1/data and pcd2_sub1/inst/extdata (task 2), respectively.

Submission 2: GA res-comb-pred (450k) elastic net (pointofdivergence)

First generate all files required for task 1 submission 7. Then copy everything in pcd1_sub7/data and pcd1_sub7/inst/extdata (task 1) to pcd2_sub2/data and pcd2_sub2/inst/extdata (task 2), respectively.
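
The copy step described for submissions 1 and 2 can be sketched in Python as follows. This is a minimal illustration, not part of the original pipeline. The function name is hypothetical, and the package directories are assumed to sit in the current working directory. It requires Python 3.8 or later for `dirs_exist_ok`.

```python
import shutil
from pathlib import Path

# Sub-directories that are copied from the task 1 package to the task 2 package.
SUBDIRS = ("data", "inst/extdata")


def copy_submission_files(src_pkg, dst_pkg):
    """Copy data and inst/extdata from src_pkg into dst_pkg.

    Existing files with the same name in dst_pkg are overwritten.
    Returns the list of destination directories.
    """
    copied = []
    for sub in SUBDIRS:
        src = Path(src_pkg) / sub
        dst = Path(dst_pkg) / sub
        if not src.is_dir():
            raise FileNotFoundError(f"generate the task 1 files first: {src}")
        shutil.copytree(src, dst, dirs_exist_ok=True)
        copied.append(dst)
    return copied


# Usage, once the task 1 files exist:
#   copy_submission_files("pcd1_sub2", "pcd2_sub1")  # submission 1
#   copy_submission_files("pcd1_sub7", "pcd2_sub2")  # submission 2
```

The explicit check for the source directory fails early when the task 1 files have not yet been generated, rather than producing an incomplete task 2 package.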

Submission 3: GA res-comb-pred elastic net (imperfectmirror)